A German distant speech recognizer based on 3D beamforming and harmonic missing data mask
Abstract
This paper addresses the problem of distant speech recognition in reverberant, noisy conditions using a star-shaped microphone array and missing data techniques. The system is evaluated on a German database that has been contaminated with noise from an apartment of the DIRHA (Distant Speech Interaction for Robust Home Applications) project. The proposed system is composed of three blocks. First, a beamformer yields an enhanced single-channel signal by filtering the multi-channel signals and then summing them. To optimize the filter weights, we apply convex (CVX) optimization over three spatial dimensions, given the spatiotemporal position of the target speaker as prior knowledge. Second, the beamformer output is used to extract pitch and to estimate the stationary part of the background noise. Third, the system produces a final noise estimate by combining the stationary noise part with the harmonic noise estimate obtained from the pitch. Finally, the filter-bank representation of the enhanced signal and the corresponding missing data mask obtained from this final noise estimate are sent to the speech recognition back-end. The purpose of this paper is to analyze the impact of employing a beamformer followed by a missing data technique.
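As a rough illustration of the first and last stages described in the abstract, the following Python sketch shows a generic filter-and-sum beamformer and a binary missing data mask derived from a noise estimate. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `filter_and_sum` and `missing_data_mask`, the FIR filter form, and the local-SNR threshold criterion are all hypothetical, and the abstract does not say how the mask is actually computed. The CVX-optimized filter weights are not reproduced; the filters are simply passed in.

```python
import math
from typing import List, Sequence


def filter_and_sum(channels: Sequence[Sequence[float]],
                   filters: Sequence[Sequence[float]]) -> List[float]:
    """Filter each microphone channel with its own FIR filter, then sum.

    Assumption: the filter taps are given; in the paper they would come
    from a convex optimization that is not reproduced here.
    """
    n = len(channels[0])
    out = [0.0] * n
    for x, h in zip(channels, filters):
        for t in range(n):
            # Causal FIR convolution, truncated to the input length.
            acc = 0.0
            for k, hk in enumerate(h):
                if t - k >= 0:
                    acc += hk * x[t - k]
            out[t] += acc
    return out


def missing_data_mask(enhanced_energy: Sequence[Sequence[float]],
                      noise_energy: Sequence[Sequence[float]],
                      snr_threshold_db: float = 0.0) -> List[List[int]]:
    """Mark a filter-bank cell reliable (1) when its estimated local SNR
    exceeds a threshold, otherwise unreliable (0).

    Assumption: a simple SNR threshold stands in for the paper's mask rule.
    """
    floor = 1e-12  # avoids division by zero and log of zero
    mask = []
    for e_row, n_row in zip(enhanced_energy, noise_energy):
        row = []
        for e, n in zip(e_row, n_row):
            speech = max(e - n, floor)  # crude speech energy estimate
            snr_db = 10.0 * math.log10(speech / max(n, floor))
            row.append(1 if snr_db > snr_threshold_db else 0)
        mask.append(row)
    return mask
```

In this sketch, a channel that receives the target one sample early can be aligned with a one-sample delay filter such as `[0.0, 1.0]`, so that the channels add coherently before the mask is computed on the filter-bank energies of the summed signal.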
Similar resources
On binary and ratio time-frequency masks for robust speech recognition
A time-varying Wiener filter extracts the speech signal from a noisy mixture using the a priori signal-to-noise ratio in a local time-frequency unit. We estimate this ratio using a binaural processor and derive a ratio time-frequency mask. This mask is used to extract the speech signal, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the perfor...
Binary and ratio time-frequency masks for robust speech recognition
A time-varying Wiener filter extracts a speech signal from a mixture using the a priori signal-to-noise ratio in a local time-frequency unit. We estimate this ratio using a binaural processor and derive a ratio time-frequency mask. This mask is used to extract the speech, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the performance of this s...
Beamforming using uniform circular arrays for distant speech recognition in reverberant environments and double talk scenarios
Beamforming is crucial for hands-free mobile terminals and voice-enabled automated home environments based on distant-speech interaction to mitigate causes of system degradation, e.g., interfering noise sources or competing speakers. This paper presents an adaptation of the most common state-of-the-art broadband beamformers to uniform circular arrays, such that competing speakers are attenuated...
Vector-quantization based mask estimation for missing data automatic speech recognition
The application of Missing Data Theory (MDT) has been shown to improve the robustness of automatic speech recognition (ASR) systems. A crucial part of an MDT-based recognizer is the computation of the reliability masks from noisy data. To estimate accurate masks in environments with unknown, non-stationary noise statistics, only weak assumptions can be made about the noise and we need to rely on a str...
Iterative Group Selection-Based Enhancement of Time-Frequency masks for Missing Data Recognition
Missing data approaches have recently been applied to speech recognition tasks to increase noise robustness. The drawback of missing data techniques is the vulnerability of the recognizer to errors in the reliability mask. This work proposes a novel group selection algorithm to perform top-down refinement of initial bottom-up reliability mask estimates with the goal of removing these errors. A n...
Publication date: 2013